closeness testing
- North America > United States > District of Columbia > Washington (0.05)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Europe > Austria > Vienna (0.14)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States (0.04)
- Europe > Czechia > Prague (0.04)
- Asia > Middle East > Israel (0.04)
- Africa > Sudan (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
Replicable Distribution Testing
Diakonikolas, Ilias; Gao, Jingyi; Kane, Daniel; Liu, Sihan; Ye, Christopher
We initiate a systematic investigation of distribution testing in the framework of algorithmic replicability. Specifically, given independent samples from a collection of probability distributions, the goal is to characterize the sample complexity of replicably testing natural properties of the underlying distributions. On the algorithmic front, we develop new replicable algorithms for testing closeness and independence of discrete distributions. On the lower bound front, we develop a new methodology for proving sample complexity lower bounds for replicable testing that may be of broader interest. As an application of our technique, we establish near-optimal sample complexity lower bounds for replicable uniformity testing, answering an open question from prior work, and for replicable closeness testing.
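The abstract does not spell out the algorithms, so the following is only a rough illustration of what "replicable" means for a tester, not the authors' method. It is a minimal sketch, assuming a support of size `n` and an ℓ1 distance parameter `eps`, that pairs the classical collision statistic for uniformity with a threshold drawn from shared randomness, a standard trick in the replicability literature. For the uniform distribution the squared ℓ2 norm is exactly 1/n, while any distribution that is `eps`-far in ℓ1 has squared ℓ2 norm at least (1 + eps²)/n, so the random threshold is drawn strictly between the two. The function names, the threshold interval, and the sample sizes are illustrative assumptions; nothing here is tuned to the paper's sample complexity.

```python
import random
from collections import Counter


def collision_estimate(samples):
    """Unbiased estimate of ||p||_2^2: the fraction of sample pairs that collide."""
    m = len(samples)
    counts = Counter(samples)
    collisions = sum(c * (c - 1) // 2 for c in counts.values())
    return collisions / (m * (m - 1) / 2)


def replicable_uniformity_test(samples, n, eps, shared_seed):
    """Return True (accept: uniform) or False (reject: eps-far from uniform in l1).

    The threshold depends only on the shared seed, not on the samples, so two
    runs with the same seed but fresh samples disagree only if the random
    threshold happens to land between their two collision estimates.
    """
    rng = random.Random(shared_seed)
    # Uniform gives exactly 1/n; eps-far gives at least (1 + eps^2)/n.
    lo = (1 + eps ** 2 / 4) / n
    hi = (1 + 3 * eps ** 2 / 4) / n
    threshold = rng.uniform(lo, hi)
    return collision_estimate(samples) <= threshold


# Usage: two independent runs on uniform data with the same shared seed.
n, eps, m = 100, 0.5, 20_000
data_rng = random.Random(1)
run1 = [data_rng.randrange(n) for _ in range(m)]
run2 = [data_rng.randrange(n) for _ in range(m)]
print(replicable_uniformity_test(run1, n, eps, shared_seed=42),
      replicable_uniformity_test(run2, n, eps, shared_seed=42))
```

The design choice worth noting is that concentration of the statistic alone does not give replicability: with a fixed threshold, an estimate sitting near it could flip between runs. Randomizing the threshold makes the chance of disagreement roughly proportional to the gap between the two estimates divided by the width of the interval.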
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > California > San Diego County > La Jolla (0.04)
- Africa > Sudan (0.04)
Better Private Distribution Testing by Leveraging Unverified Auxiliary Data
Aliakbarpour, Maryam; Burudgunte, Arnav; Cannone, Clément; Rubinfeld, Ronitt
Accurately analyzing data while preserving individual privacy is a fundamental challenge in statistical inference. Since its formulation nearly two decades ago, Differential Privacy (DP) [DMNS06] has emerged as the leading framework for privacy-preserving data analysis, providing strong mathematical privacy guarantees and gaining adoption by major entities such as the U.S. Census Bureau, Amazon [Ama24], Google [EPK14], Microsoft [DKY17], and Apple [Dif17; TVVKFSD17]. Unfortunately, DP guarantees often come at the cost of increased data requirements or computational resources, which has limited the widespread adoption of differential privacy despite its theoretical appeal. To address this issue, a recent line of work has investigated whether access to even small amounts of additional public data could help mitigate this loss of performance. Promising results for various tasks have been shown, both experimentally [KST20; LLHR24; BZHZK24; DORKSF24] and theoretically [BKS22; BBCKS23]. The use of additional auxiliary information is very enticing, as such access is available in many real-world applications: for example, hospitals handling sensitive patient data might leverage public datasets, records from different periods or locations, or synthetic data generated by machine learning models to improve analysis. Similarly, medical or socio-economic studies focusing on a minority or protected group can leverage statistical data from the overall population. However, integrating public data introduces its own challenges, as it often lacks guarantees regarding its accuracy or relevance to private datasets.
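To make the "increased data requirements" of DP concrete, here is a minimal sketch of the standard Laplace mechanism applied to a single empirical frequency. It is a generic textbook illustration, not the method of this paper, and it assumes the "replace one sample" notion of neighboring datasets with the sample size `m` treated as public. Replacing one sample changes a count by at most 1, so adding Laplace noise of scale 1/ε to the count gives ε-DP; the resulting frequency carries extra error of order 1/(εm), which only shrinks as more samples are collected. The function names `laplace_noise` and `private_frequency` are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    # Exponential with rate 1/scale has mean `scale`; the difference is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_frequency(samples, element, epsilon, rng):
    """epsilon-DP estimate of the frequency of `element` in `samples`.

    Assumes replace-one neighboring datasets with len(samples) public: the
    count has sensitivity 1, so Laplace noise of scale 1/epsilon suffices.
    """
    count = sum(1 for x in samples if x == element)
    noisy_count = count + laplace_noise(1 / epsilon, rng)
    return noisy_count / len(samples)


# Usage: the privacy-induced error in the frequency falls roughly like 1/(epsilon * m).
rng = random.Random(0)
for m in (100, 10_000):
    samples = [0] * (m // 2) + [1] * (m - m // 2)
    errs = [abs(private_frequency(samples, 0, 1.0, rng) - 0.5) for _ in range(200)]
    print(m, sum(errs) / len(errs))
```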
- Europe > Austria > Vienna (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.48)